Future parallel computers must efficiently execute not only hand-coded applications but
also programs written in high-level, parallel programming languages. Today's machines 
limit these programs to a single communication paradigm, either message-passing or

Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in
shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting 
tend to produce large amounts of memory and interconnect contention, introducing 
performance bottlenecks that become markedly more pronounced as applications scale. 
We argue that this problem is not fundamental, and that one can in fact construct
busy-wait synchronization algorithms that induce no memory or interc ...

We introduce Minos, a microarchitecture that implements Biba's low-water-mark integrity
policy on individual words of data. Minos stops attacks that corrupt control data to hijack 
program control flow but is orthogonal to the memory model. Control data is any data 
which is loaded into the program counter on control flow transfer, or any data used to 
calculate such data. The key is that Minos tracks the integrity of all data, but protects 
control flow by checking this integrity when a program use ...

For large caches, the interaction between cache access and address translation affects the
machine cycle time and the access time to memory. The physically addressed caches slow 
down the cache access due to the virtual address translation. The virtually addressed 
caches are faster, but the synonym problem is difficult to handle. By some software
constraints and hardware support, our virtually addressed physically tagged caches can 
achieve the same speed as traditional virtually addressed cac ...

Multiprocessors with non-uniform memory access times introduce the problem of placing
data near the processes that use them, in order to improve performance. We have
implemented an automatic page placement strategy in the Mach operating system on the 
IBM ACE multiprocessor workstation. Our experience indicates that even very simple 
automatic strategies can produce nearly optimal page placement. It also suggests that the 
greatest leverage for further performance improvement lies in reducing ...

This paper compares eight commercial parallel processors along several dimensions. The
processors include four shared-bus multiprocessors (the Encore Multimax, the Sequent 
Balance system, the Alliant FX series, and the ELXSI System 6400) and four network 
multiprocessors (the BBN Butterfly, the NCUBE, the Intel iPSC/2, and the FPS T Series). 
The paper contrasts the computers from the standpoint of interconnection structures, 
memory configurations, and interprocessor communication. Also, the share ...

This paper discusses implementations of fine-grain memory access control, which
selectively restricts reads and writes to cache-block-sized memory regions. Fine-grain 
access control forms the basis of efficient cache-coherent shared memory. This paper 
focuses on low-cost implementations that require little or no additional hardware. These 
techniques permit efficient implementation of shared memory on a wide range of parallel 
systems, thereby providing shared-memory codes with a portability ...

annual international symposium on Computer architecture ISCA '89, volume

The problem of building a scalable shared memory multiprocessor can be reduced to that
of building a scalable memory hierarchy, assuming interprocessor communication is 
handled by the memory system. In this paper, we describe the VMP-MC design, a 
distributed parallel multi-computer based on the VMP multiprocessor design, that is 
intended to provide a set of building blocks for configuring machines from one to several 
thousand processors. VMP-MC uses a memory hierarchy based on shared caches ... 

Coupling compiler-enabled and conventional memory accessing for energy efficiency
Raksit Ashok, Saurabh Chheda, Csaba Andras Moritz 

May 2004 ACM Transactions on Computer Systems (TOCS), volume 22 issue 2 
Publisher: ACM Press 
This article presents Cool-Mem, a family of memory system architectures that integrate
conventional memory system mechanisms, energy-aware address translation, and 
compiler-enabled cache disambiguation techniques, to reduce energy consumption in 
general-purpose architectures. The solutions provided in this article leverage interlayer
tradeoffs between architecture, compiler, and operating system layers. Cool-Mem achieves 
power reduction by statically matching memory operations with energy-eff ... 

Keywords: Energy efficiency, translation buffers, virtually addressed caches 

Distributed operating systems have many aspects in common with centralized ones, but 
they also differ in certain ways. This paper is intended as an introduction to distributed 
operating systems, and especially to current university research about them. After a 
discussion of what constitutes a distributed operating system and how it is distinguished 
from a computer network, various key design issues are discussed. Then several examples 
of current research projects are examined in some detail ... 

Many operating systems allow user programs to specify the protection level (inaccessible, 
read-only, read-write) of pages in their virtual memory address space, and to handle any 
protection violations that may occur. Such page-protection techniques have been exploited 
by several user-level algorithms for applications including generational garbage collection 
and persistent stores. Unfortunately, modern hardware has made efficient handling of 
page protection faults more difficult. Moreover, page- ...

The Mneme project is an investigation of techniques for integrating programming language
and database features to provide better support for cooperative, information-intensive 
tasks such as computer-aided software engineering. The project strategy is to implement 
efficient, distributed, persistent programming languages. We report here on the Mneme 
persistent object store, a fundamental component of the project, discussing its design and 
initial prototype. Mneme stores objects

This paper presents an overview of the Cedar programming environment, focusing on its
overall structure, that is, the major components of Cedar and the way they are organized.
Cedar supports the development of programs written in a single programming language, 
also called Cedar. Its primary purpose is to increase the productivity of programmers 
whose activities include experimental programming and the development of prototype 
software systems for a high-performance personal computer. T ... 